The Firmicutes represent a major component of the intestinal microflora. The intestinal Firmicutes are
a large, diverse group of organisms, many of which are poorly characterized due to their anaerobic
growth requirements. Although most Firmicutes are Gram positive, members of the class
Negativicutes, including the genus Veillonella, stain Gram negative. Veillonella are among the most
abundant organisms of the oral and intestinal microflora of animals and humans, in spite of being
strict anaerobes. In this work, the genomes of 24 Negativicutes, including eight Veillonella spp., are
compared to 20 other Firmicutes genomes; a further 101 prokaryotic genomes were included, covering
26 phyla. Thus a total of 145 prokaryotic genomes were analyzed by various methods to investigate
the apparent conflict between the Gram stain of Veillonella and its taxonomic position within the
Firmicutes. Comparison of the genome sequences confirms that the Negativicutes are distantly related
to Clostridium spp., based on 16S rRNA, complete genomic DNA sequences, and a consensus
tree based on conserved proteins. The genus Veillonella is relatively homogeneous: inter-genus pairwise
comparison identifies at least 1,350 shared proteins, although less than half of these are found
in any given Clostridium genome. Only 27 proteins are found conserved in all analyzed prokaryote
genomes. Veillonella has distinct metabolic properties, and significant similarities to genomes of
Proteobacteria are not detected, with the exception of a shared LPS biosynthesis pathway. The clade
within the class Negativicutes to which the genus Veillonella belongs exhibits unique properties,
most of which are in common with Gram-positives and some with Gram-negatives. They are only
distantly related to Clostridia, but are even less closely related to Gram-negative species. Though the
Negativicutes stain Gram-negative and possess two membranes, the genome and proteome analysis
presented here confirms their place within the (mainly) Gram-positive phylum of the Firmicutes.
Further studies are required to unveil the evolutionary history of the Veillonella and other Negativicutes.



Background 



The genus Veillonella, belonging to Negativicutes, consists of anaerobic, non-fermentative,
Gram-negative cocci that are normally observed in pairs or short chains, and are non-sporulating
and non-motile [1]. Veillonella spp. are abundant in the human microbiome and are found in the
oral, respiratory, intestinal and genitourinary flora of humans and animals; they can make up as
much as 10% of the bacterial community initially colonizing the enamel [2] and are found
throughout the entire oral cavity [3], especially on the tongue dorsum and in saliva [4]. The
importance of Veillonella spp. in human infections is uncertain, and they are generally
considered to be of low virulence. Veillonella form biofilms, often with Streptococcus spp., and
species of these genera have been found to be more abundant in the oral microflora of people with
poor oral health [5]. Studies have shown that during formation of early dental plaque, the
fraction of Veillonella spp. changes in mixed-microbial colonies with streptococci [6]. Thus,
Veillonella spp. may play a role in caries formation as they utilize the lactic acid produced by
the organisms conducive to caries [7]. Veillonella are also among the most common anaerobic
species reported from pulmonary samples and are frequently recovered from cystic fibrosis cases
[8]. The organisms are also abundant in the human gut flora, where their numbers were found to
be higher in children with type I diabetes compared to healthy controls [9]. Currently, 12
species of Veillonella have been characterized [10,11] including V. parvula, V. atypica and
V. dispar, which are found in the human oral cavity.

The Negativicutes are the only diderm (literally 'two skins') members of the phylum Firmicutes,
as they possess an inner and an outer membrane. Their placement within the Firmicutes has been
widely accepted, and has been confirmed by 16S rRNA analysis [12]. However, their genomes have
not been analyzed in detail to confirm their taxonomic position. This work presents a broad
analysis of the Negativicutes, with a focus on the Veillonella spp., using comparative microbial
genomics. A total of 24 genomes from the Negativicutes were compared to 121 genomes covering
most of the taxonomic span of sequenced bacterial genomes. We investigated how the Negativicutes
genomes compared to other bacterial genomes using three different and complementary approaches:
1) phylogenetic trees to visualize the relative distance of the Negativicutes genomes to other
genomes; 2) amino acid composition, nucleotide tetramer frequency and metabolism analysis using
2-D clustering and heatmaps to compare genomes; and 3) proteomic comparison across the
Negativicutes genomes.

Materials and Methods 

Genome sequences used for analysis 

The set of 145 genomes included in this study (24 
Negativicutes genomes and 121 other prokaryotic 
genomes covering 26 phyla) are listed in Table 1. 



Table 1. Genomes used in this study

Phylum | Name of organism and strain | Strain designation | Type strain | NCBI Taxon ID | NCBI Project ID
Acidobacteria | Acidobacterium capsulatum | ATCC 51196 | Yes | 240015 | 28085
Acidobacteria | Korebacter versatilis | Ellin 345 |  | 204669 | 15771
Acidobacteria | "Solibacter usitatus" | Ellin6076 |  | 234267 | 12638
Actinobacteria | Bifidobacterium bifidum | 317B | No | 1681 | 42863
Actinobacteria | Catenulispora acidiphila | ID139908, DSM 44928 | Yes | 479433 | 21085
Actinobacteria | Corynebacterium pseudotuberculosis | C231 | No | 681645 | 40875
Actinobacteria | Segniliparus rugosus | ATCC BAA-974 | Yes | 679197 | 40685
Actinobacteria | Streptomyces bingchenggensis | BCW-1 | Name not validly published | 749414 | 46847
Actinobacteria | Tropheryma whipplei | Twist | Yes | 203267 | 95
Aquificae | Persephonella marina | EX-H1 | Yes | 123214 | 12526
Aquificae | Sulfurihydrogenibium sp. | Y03AOP1 | No type strain available | 436114 | 18889
Aquificae | Thermocrinis albus | HI 11/12, DSM 14484 | Yes | 638303 | 37275
Bacteroidetes | Bacteroides thetaiotaomicron | VPI-5482 | Yes | 226186 | 399
Bacteroidetes | Candidatus Sulcia muelleri | DMIN |  | 641892 | 37785
Bacteroidetes | Chitinophaga pinensis | UQM2034, DSM 2588 | Yes | 485918 | 27951
Bacteroidetes | Paludibacter propionicigenes | WB4, DSM 17365 | Yes | 694427 | 42009
Chlamydiae | Protochlamydia amoebophila | UWE25 | Yes | 264201 | 10700
Chlamydiae | Chlamydia trachomatis | E/Sweden2 | No | 634464 | 43167
Chlamydiae | Chlamydophila pneumoniae | AR39 | No | 115711 | 247
Chlamydiae | Waddlia chondrophila | WSU 86-1044 | Yes | 716544 | 43761
[missing] | [missing] | [missing] | Name not validly published | 340177 | 13921
Chlorobi | Chlorobium tepidum | TLS | Yes | 194439 | 302
Chloroflexi | Chloroflexus aggregans | DSM 9485 | Yes | 326427 | 16708
Chloroflexi | Dehalococcoides sp. | BAV1 | No | 216389 | 15770
Chloroflexi | Herpetosiphon aurantiacus | ATCC 23779 | Yes | 316274 | 16523
Chloroflexi | Roseiflexus sp. | RS-1 | No type strain available | 357808 | 16190
[missing] | [missing] | PCC 79.11 | No | [illegible] | [illegible]
Cyanobacteria | Prochlorococcus marinus | MIT9301 | No | 167546 | 15746
Cyanobacteria | Synechocystis sp. | PCC6803 | No | 1148 | 60
Deferribacteres | Calditerrivibrio nitroreducens | Yu37-1, DSM 19672 | Yes | 768670 | 49523
Deferribacteres | Deferribacter desulfuricans | SSM1, DSM 14783 | Yes | 197162 | 37285
Deferribacteres | Denitrovibrio acetiphilus | N2460, DSM 12809 | Yes | 522772 | 29431
Deinococcus-Thermus | Oceanithermus profundus | 506, DSM 14977 | Yes | 670487 | 40223
Deinococcus-Thermus | Thermus thermophilus | HB8 | Yes | 300852 | 13202
Deinococcus-Thermus | Truepera radiovictrix | RQ-24, DSM 17093 | Yes | 649638 | 38371
Dictyoglomi | Dictyoglomus turgidum | DSM 6724 | Yes | 515635 | 29175
Elusimicrobia | Elusimicrobium minutum | Pei 191 | Yes | 445932 | 19701
Fibrobacteres | Fibrobacter succinogenes | S85 | Yes | 59374 | 32617
Firmicutes | Acetohalobium arabaticum | Z-7288, DSM 5501 | Yes | 574087 | 32769
Firmicutes | Acidaminococcus fermentans | VR4, DSM 20731 | Yes | 591001 | 33685
Firmicutes | Acidaminococcus sp. | D21 | No type strain available | 563191 | 34117
Firmicutes | Alkaliphilus oremlandii | OhILAs | Yes | 350688 | 16083
Firmicutes | Bacillus subtilis subsp. subtilis | 168 | Yes | 224308 | 76
Firmicutes | Clostridium botulinum | F Langeland | No | 441772 | 19519
Firmicutes | Clostridium cellulolyticum | H10 | Yes | 394503 | 17419
Firmicutes | Clostridium difficile | 630 (epidemic type X) | No | 272563 | 78
Firmicutes | "Desulfotomaculum reducens" | MI-1 | Name not validly published | 349161 | [illegible]
Firmicutes | Dialister invisus | DSM 15470 | Yes | 592028 | 33143
Firmicutes | Dialister micraerophilus | Oral Taxon 843 DSM 19965 | Yes | 888062 | 53029
Firmicutes | Dialister micraerophilus | UPII-345-E | No | 910314 | 59521
Firmicutes | Enterococcus faecalis | V583 | No | 226185 | 70
Firmicutes | Eubacterium cylindroides | T2-87 | No | 717960 | 45917
Firmicutes | Eubacterium rectale | A1-86, DSM 17629 | No | 39491 | 39159
Firmicutes | Exiguobacterium sibiricum | 255-15 | Yes | 262543 | 10649
Firmicutes | Geobacillus kaustophilus | HTA426 | Yes | 235909 | 13233
Firmicutes | Lactococcus lactis | cremoris MG1363 | No | 416870 | 18797
Firmicutes | Lysinibacillus sphaericus | C3-41 | No | 444177 | 19619
Firmicutes | Megamonas hypermegale | ART12/1 | No | 158847 | 39163
Firmicutes | Megasphaera genomosp. | type 1 28L | No type strain | 699218 | 42553
[missing] | [missing] | [missing] | No | [missing] | [missing]
[missing] | [missing] | [missing] | Yes | [illegible] | [missing]
Firmicutes | Paenibacillus sp. | JDR-2 | No | 324057 | 20399
Firmicutes | Phascolarctobacterium sp. | YIT 12067 | No | 626939 | 48505
Firmicutes | Selenomonas artemidis | F0399 | No | 749551 | 47277
Firmicutes | Selenomonas flueggei | ATCC 43531 | Yes | 638302 | 37273
Firmicutes | Selenomonas noxia | ATCC 43541 | Yes | 585503 | 34641
Firmicutes | Selenomonas sp. | Oral Taxon 137 F0430 | No type strain available | 879310 | 52055
Firmicutes | Selenomonas sp. | Oral Taxon 149 67H29BP | No type strain available | 864563 | 50535
Firmicutes | Selenomonas sputigena | DSM 20758 | Yes | 546271 | 51247
Firmicutes | Staphylococcus aureus aureus | ED98 | No | 681288 | 39547
Firmicutes | Streptococcus pneumoniae | TIGR4 | No | 170187 | 277
Firmicutes | Thermoanaerobacter sp. | X514 | Name not validly published | 399726 | 16394
Firmicutes | Veillonella atypica | ACS-049-V-Sch6 | No | 866776 | 51077
Firmicutes | Veillonella atypica | ACS-134-V-Col7a | No | 866778 | 51079
Firmicutes | Veillonella dispar | ATCC 17748 | Yes | 546273 | 30491
Firmicutes | Veillonella parvula | ATCC 17745 | No | 686660 | 41557
Firmicutes | Veillonella parvula | Te3, DSM 2008 | Yes | [missing] | 21091
[missing] | [missing] | 3_1_44 | Name not validly published | 457416 | 41975
Firmicutes | Veillonella sp. | 6_1_27 | Name not validly published | 450749 | 41977
Firmicutes | Veillonella sp. | Oral Taxon 158 F0412 | Name not validly published | 879309 | 52053
Fusobacteria | Fusobacterium nucleatum nucleatum | ATCC 25586 | Yes | 190304 | 295
Fusobacteria | Ilyobacter polytropus | CuHBu1, DSM 2926 | Yes | 572544 | 32577
Fusobacteria | Leptotrichia buccalis | C-1013-b, DSM 1135 | Yes | 523794 | 29445
Fusobacteria | Sebaldella termitidis | NCTC 11300 | Yes | 526218 | 29539
Proteobacteria | Acinetobacter baumannii | SDF | No | 509170 | 13001
Proteobacteria | Alkalilimnicola ehrlichii | MLHE-1 | Yes | 187272 | 15763
Proteobacteria | Arcobacter nitrofigilis | DSM 7299 | Yes | 572480 | 32593
Proteobacteria | Burkholderia xenovorans | (fungorum) LB400 | Yes | 266265 | 254
Proteobacteria | Campylobacter jejuni | doylei 269.97 | No | 360109 | 17163
Proteobacteria | Candidatus Pelagibacter ubique | SAR11 HTCC1062 | Name not validly published | 335992 | 13989
Proteobacteria | Candidatus Zinderia insecticola | CARI | Name not validly published | 871271 | 51243
Proteobacteria | Cellvibrio japonicus | [missing] | [missing] | [missing] | [illegible]
Proteobacteria | Cupriavidus taiwanensis | [missing] | [missing] | [missing] | [illegible]
Proteobacteria | Escherichia coli | [missing] | [missing] | [illegible] | [illegible]
Proteobacteria | Geobacter uraniireducens | Rf4 | Yes | 351605 | 15768
Proteobacteria | Hahella chejuensis | KCTC2396 | Yes | 349521 | 16064
Proteobacteria | Haliangium ochraceum | SMP-2, DSM 14365 | Yes | 502025 | 28711
Proteobacteria | Helicobacter pylori | 908 | No | 869727 | 50869
Proteobacteria | Lawsonia intracellularis | PHE/MN1-00 | No | 363253 | 183
Proteobacteria | Magnetococcus sp. | MC-1 | Name not validly published | 156889 | 262
Proteobacteria | Methylobacterium nodulans | ORS2060 | Yes | 460265 | 20477
Proteobacteria | Neisseria meningitidis | Z2491 | No | 122587 | 252
Proteobacteria | Neorickettsia sennetsu | Miyayama | Yes | 222891 | 357
Proteobacteria | Nitrosomonas eutropha | C91 (C71) | Yes | 335283 | 13913
Proteobacteria | Photorhabdus luminescens laumondii | TT01 | Yes | 243265 | 9605
Proteobacteria | Polynucleobacter necessarius | STIR1 | No | 452638 | 19991
Proteobacteria | Pseudomonas aeruginosa | LESB58 | No | 557722 | 31101
Proteobacteria | Pseudomonas fluorescens | SBW25 | No | 216595 | 31229
Proteobacteria | Pseudomonas stutzeri | A1501 | No | 379731 | 16817
Proteobacteria | Salmonella enterica enterica | [illegible] | [illegible] | [missing] | [illegible]
Proteobacteria | Shewanella oneidensis | MR-1 | Yes | 211586 | [illegible]
Proteobacteria | Sorangium cellulosum | So ce56 | No | [missing] | 28111
Proteobacteria | Stigmatella aurantiaca | DW4/3-1 | No | 378806 | 52561
Proteobacteria | Sulfurospirillum deleyianum | 5175, DSM 6946 | No | 525898 | 29529
Proteobacteria | Vibrio cholerae | 0395 | No | 345073 | 32853
Spirochaetes | Borrelia turicatae | 91E135 | Yes | 314724 | [illegible]
Spirochaetes | Brachyspira murdochii | 56-150, DSM 12563 | Yes | 526224 | 29543
Spirochaetes | Leptospira interrogans | lai 56601 | No | 189518 | 293
Synergistetes | Thermanaerovibrio acidaminovorans | Su883, DSM 6589 | Yes | 525903 | 29531
Tenericutes | Acholeplasma laidlawii | PG-8A | No | 441768 | 19259
Tenericutes | Candidatus Phytoplasma asteris | yellows witches'-broom AY-WB | Name not validly published | 322098 | 13478
Tenericutes | Candidatus Phytoplasma mali | AT | Name not validly published | 37692 | 25335
Tenericutes | Mycoplasma genitalium | G37 | Yes | 243273 | 97
Tenericutes | Mycoplasma pneumoniae | FH | No | 722438 | 49525
Tenericutes | Ureaplasma parvum | sv 3, ATCC 27815 | No | 505682 | 19087
Thermotogae | Fervidobacterium nodosum | Rt17-B1 | Yes | 381764 | 16719
Thermotogae | Kosmotoga olearia | TBF 19.5.1 | Yes | 521045 | 29419
Thermotogae | Petrotoga mobilis | SJ95 | Yes | 403833 | 17679
Thermotogae | Thermotoga naphthophila | RKU-10 | Yes | 590168 | 33663
Verrucomicrobia | Akkermansia muciniphila | ATCC BAA-835 | Yes | 349741 | 20089
Verrucomicrobia | Opitutus terrae | PB90-1 | Yes | 452637 | [missing]
Crenarchaeota | Sulfolobus solfataricus | P2 |  | 273057 | 108
Crenarchaeota | Thermosphaera aggregans | M11TL, DSM 11486 | Yes | 633148 | 36571
Euryarchaeota | Halogeometricum borinquense | PR3, DSM 11551 | Yes | 469382 | 20743
Euryarchaeota | Methanocella sp. | RC-I | Name not validly published | 351160 | 19641
Euryarchaeota | Methanothermus fervidus | V24S, DSM 2088 | Yes | 523846 | 33689
Korarchaeota | Candidatus Korarchaeum cryptofilum | OPF8 | Name not validly published | 374847 | 16525
Nanoarchaeota | "Nanoarchaeum equitans" | Kin4-M | Name not validly published | 228908 | 9599



16S rRNA tree

For this analysis, 16S rRNA sequences were predicted from the whole genome sequences of the
selected organisms, using the RNAmmer algorithm [13]. These sequences were aligned using the
MAFFT program, with the iterative refinement algorithm using the maximum number of iterations
(1000) and default parameters for gap penalties [14]. A distance tree was constructed using
MEGA5 [15] with the Neighbor-joining algorithm [16] and 1,000 bootstrap resamplings. The taxa
in the resulting tree were collapsed to phyla, except for the Negativicutes.
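
The bootstrap step behind a distance tree can be illustrated with a minimal Python sketch. It is not the MEGA5 implementation: the function names are hypothetical, the distance is a plain p-distance (the model used in the study is not stated), and the tree-building step itself is omitted. Each replicate resamples alignment columns with replacement and recomputes the pairwise distances that a neighbor-joining routine would consume.

```python
import random


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    # The same resampled columns are applied to every taxon.
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise p-distances for all taxa in the alignment."""
    names = sorted(alignment)
    return {(a, b): p_distance(alignment[a], alignment[b]) for a in names for b in names}


if __name__ == "__main__":
    # Toy alignment (hypothetical sequences, for illustration only).
    toy = {"taxon_a": "ACGTACGT", "taxon_b": "ACGTACGA", "taxon_c": "ACGAACGA"}
    rng = random.Random(1)
    replicates = [distance_matrix(bootstrap_replicate(toy, rng)) for _ in range(1000)]
    print(len(replicates), "bootstrap distance matrices")
```

Bootstrap support for a branch is then the fraction of replicate trees in which that branch reappears.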

Composition Vector Tree (CV) 

A Composition Vector Tree was constructed based on protein sequences of the 145 selected
genomes using a webserver (available at tlife.fudan.edu.cn/cvtree) with the K parameter set at
6 [17]. The outcome from the program is a distance matrix based on amino acid sequence
comparisons, which is then used to generate a phylogenetic tree with the neighbor-joining
method. In the tree shown, the outgroup chosen was Methanothermus fervidus (an archaeon). After
tree visualization with MEGA5, branches were collapsed wherever possible with the exception of
the Negativicutes branch, which remained expanded.
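
A composition-vector distance can be sketched as follows. This is an illustrative reading of the general approach, not the CVTree webserver code, and the function names are hypothetical. It assumes that each K-string frequency is compared with a prediction from the overlapping (K-1)- and (K-2)-string frequencies, and that the distance between two genomes is (1 - C)/2, where C is the cosine correlation of their vectors.

```python
import math
from collections import Counter, defaultdict


def kmer_freqs(proteins, k):
    """Relative frequencies of all overlapping k-strings in a list of protein sequences."""
    counts = Counter(p[i:i + k] for p in proteins for i in range(len(p) - k + 1))
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


def composition_vector(proteins, k=6):
    """Relative deviation of observed k-string frequencies from their predicted values (k >= 2)."""
    f_k = kmer_freqs(proteins, k)
    f_k1 = kmer_freqs(proteins, k - 1)
    f_k2 = kmer_freqs(proteins, k - 2)

    # Index (k-1)-strings by their first k-2 letters so overlapping pairs can be joined.
    by_prefix = defaultdict(list)
    for t in f_k1:
        by_prefix[t[:-1]].append(t)

    vector = {}
    for p in f_k1:
        for t in by_prefix[p[1:]]:
            s = p + t[-1]  # k-string whose two (k-1)-substrings are p and t
            predicted = f_k1[p] * f_k1[t] / f_k2[p[1:]]
            vector[s] = (f_k.get(s, 0.0) - predicted) / predicted
    return vector


def cv_distance(u, v):
    """Distance (1 - C) / 2, where C is the cosine correlation of two composition vectors."""
    dot = sum(u[s] * v[s] for s in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values()) * sum(x * x for x in v.values()))
    correlation = dot / norm if norm else 0.0  # assumption: treat an empty vector as uncorrelated
    return (1.0 - correlation) / 2.0
```

Calling `cv_distance(composition_vector(a), composition_vector(b))` for every pair of proteomes would yield a distance matrix of the kind that is passed on to neighbor-joining.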

Consensus tree of conserved genes 

Using the list of universally conserved core genes, previously identified by Ciccarelli et al.
[18], and an implementation of BLAST, a set of genes that was shared among all 145 genomes was
identified. Proteins that had no match in at least one genome or showed a poor E-value were
eliminated. The 27 conserved core genes were extracted (Table 2) and a multiple alignment was
produced using MUSCLE software [19]. A set of phylogenetic trees was constructed by PAUP [20]
and a best-fit consensus tree was generated using the Phylogeny Inference Package (PHYLIP) as
described elsewhere [21]. Bootstrap values were found after 27 resamplings, which is equal to
the number of gene families conserved in all the analyzed genomes.
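
The filtering step described above (dropping gene families that lack a match in any genome or have a poor E-value) can be sketched in Python. The data layout, the function name and the E-value threshold of 1e-5 are assumptions made for illustration; the text does not state the threshold that was used.

```python
def universally_conserved(hits, genomes, max_evalue=1e-5):
    """Return gene families with an acceptable hit in every genome.

    hits: dict mapping family -> {genome: best E-value}; genomes without a hit are absent.
    max_evalue: assumed threshold for a "poor" E-value (not given in the text).
    """
    conserved = []
    for family, per_genome in hits.items():
        if all(g in per_genome and per_genome[g] <= max_evalue for g in genomes):
            conserved.append(family)
    return sorted(conserved)


if __name__ == "__main__":
    # Toy input: E-values are invented for illustration.
    toy_hits = {
        "COG0048": {"genome_1": 1e-40, "genome_2": 1e-35},
        "COG0012": {"genome_1": 1e-50, "genome_2": 1e-2},   # poor E-value in genome_2
        "COG0201": {"genome_1": 1e-60},                      # no match in genome_2
    }
    print(universally_conserved(toy_hits, ["genome_1", "genome_2"]))
```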

DNA tetramer analysis and amino acid usage 

A tetramer frequency heatmap was constructed from the ratio of observed to expected
tetranucleotide frequencies for each genome [22]. The expected tetranucleotide frequencies
were computed from each genome's base composition. The observed/expected ratios were then
subjected to hierarchical clustering, using complete linkage and Euclidean distance, with
respect to both strains and tetramers.
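
The observed/expected ratio can be written out as a short Python sketch. It assumes a zero-order model in which the expected frequency of a tetranucleotide is the product of its four single-base frequencies; the exact estimator of [22] may differ, and the function name is hypothetical.

```python
from collections import Counter
from itertools import product


def tetramer_ratios(genome: str) -> dict[str, float]:
    """Observed/expected frequency ratio for each of the 256 tetranucleotides."""
    seq = genome.upper()
    base_counts = Counter(b for b in seq if b in "ACGT")
    total_bases = sum(base_counts.values())
    base_freq = {b: base_counts[b] / total_bases for b in "ACGT"}

    # Count overlapping 4-mers, skipping windows that contain ambiguous bases.
    windows = [seq[i:i + 4] for i in range(len(seq) - 3)]
    windows = [w for w in windows if set(w) <= set("ACGT")]
    observed = Counter(windows)
    n_windows = len(windows)

    ratios = {}
    for kmer in map("".join, product("ACGT", repeat=4)):
        expected = 1.0
        for b in kmer:
            expected *= base_freq[b]  # expectation from base composition only
        obs = observed[kmer] / n_windows
        ratios[kmer] = obs / expected if expected > 0 else 0.0
    return ratios
```

Each genome then contributes one 256-element vector of ratios, and these vectors form the rows of the matrix that is clustered.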

The amino acid heatmap is based on frequencies of 
deduced proteomic amino acids from each genome 
normalized with respect to the total number of ami- 
no acids in each genome. The amino acid frequencies 
for each genome were clustered using complete 
linkage and Euclidean distance with respect to both 
genomes and amino acids. The heatmap was made 
using the R package ggplot2 [23]. 
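
Complete-linkage clustering of amino acid frequency profiles can be illustrated with a small pure-Python sketch. The study used R for this step; the code below is only a stand-in with hypothetical function names, and it clusters genomes only (the heatmap also clusters the amino acids).

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_frequencies(proteins):
    """Amino acid frequencies normalized by the total number of residues in the proteome."""
    counts = Counter(aa for p in proteins for aa in p.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[aa] / total for aa in AMINO_ACIDS]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def complete_linkage(profiles):
    """Agglomerative clustering; returns the merges as (cluster_a, cluster_b, distance)."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance.
                d = max(euclidean(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges
```

The order of the merges defines the dendrogram drawn alongside the heatmap.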

Comparison of metabolism potential 

The protein sequences of Kyoto Encyclopedia of 
Genes and Genomes (KEGG) orthology categories 
[24] were downloaded and only the Bacterial se- 
quences were considered. The Hidden Markov 
model (HMM) of each ortholog was generated us- 
ing HMMER version 3 [25] based on the multiple 
alignment of each orthologous set of KEGG pro- 
teins, using MUSCLE software [19]. The 145 pro- 
teomes were queried against the HMMs to infer 
their ontology. A cutoff of 1 × 10^-30 was used for
statistical significance. A heatmap of each pathway 
and process derived from the database KEGG was 
illustrated based on normalized abundance of the 
enzymes present in each pathway. The heatmap 
and hierarchical clustering were performed in the 
software R [23]. 
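
How HMM hits might be turned into a per-pathway value is sketched below. The text does not define "normalized abundance"; this sketch assumes it is the fraction of a pathway's orthologs detected in a proteome at the 1 × 10^-30 cutoff. The function name, the input layout and the identifiers in the example are hypothetical.

```python
def pathway_completeness(hits, pathways, cutoff=1e-30):
    """Fraction of each pathway's orthologs detected in one proteome.

    hits: iterable of (protein_id, ortholog_id, evalue) from the HMM search.
    pathways: dict mapping pathway name -> set of ortholog IDs.
    cutoff: significance threshold (1e-30, as given in the text).
    """
    detected = {ortholog for _protein, ortholog, evalue in hits if evalue <= cutoff}
    return {
        name: len(detected & orthologs) / len(orthologs)
        for name, orthologs in pathways.items()
        if orthologs
    }


if __name__ == "__main__":
    # Toy input with invented identifiers and E-values.
    toy_hits = [("prot_1", "K_a", 1e-50), ("prot_2", "K_b", 1e-10), ("prot_3", "K_c", 1e-40)]
    toy_pathways = {"pathway_A": {"K_a", "K_b"}, "pathway_B": {"K_c"}}
    print(pathway_completeness(toy_hits, toy_pathways))
```

One such dictionary per proteome gives the genome-by-pathway matrix that a heatmap and hierarchical clustering would operate on.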

Construction of BLAST matrix and proteome 
comparison 

Reciprocal BLAST was performed between each genome pair. The program blastall version 2.2.25
was used for BLAST implementation using default settings (BLASTp, E-value set to 1 × 10^-5 for
non-homologs and 1 × 10^-8 for homologs, without filtering). A hit was considered significant
at a BLAST cutoff of 95% identity and 95% coverage (of the longest gene in the comparison). The
number of hits was then given as a percentage of the genes in the column representing the
corresponding genome. The diagonal designates internal homologs, computed by blasting each
genome against itself. To avoid including identical genes, the second-highest scoring hits were
used. Furthermore, we also performed homology reduction of the diagonal hits, using an
implementation of the Hobohm algorithm [26].
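
One off-diagonal cell of such a BLAST matrix could be computed along the following lines. This is a sketch under stated assumptions, not the authors' pipeline: the `Hit` record and function names are hypothetical, BLAST output parsing is omitted, and the 95% identity and coverage thresholds are taken from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    query: str
    subject: str
    identity: float   # percent identity of the alignment
    align_len: int    # alignment length in residues
    query_len: int
    subject_len: int


def is_significant(hit, min_identity=95.0, min_coverage=0.95):
    """Apply the identity cutoff and the coverage cutoff relative to the longer gene."""
    longest = max(hit.query_len, hit.subject_len)
    return hit.identity >= min_identity and hit.align_len / longest >= min_coverage


def reciprocal_pairs(a_vs_b, b_vs_a):
    """Gene pairs (a, b) with a significant hit in both search directions."""
    forward = {(h.query, h.subject) for h in a_vs_b if is_significant(h)}
    backward = {(h.subject, h.query) for h in b_vs_a if is_significant(h)}
    return forward & backward


def matrix_cell(a_vs_b, b_vs_a, n_genes_in_b):
    """Percentage of genes in genome B with a reciprocal significant hit in genome A."""
    matched_b = {b for _a, b in reciprocal_pairs(a_vs_b, b_vs_a)}
    return 100.0 * len(matched_b) / n_genes_in_b
```

Here genome B plays the role of the column genome, so the percentage is expressed relative to its gene count.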

Results 

Twenty-four Negativicutes genomes were compared to 121 other prokaryotic genomes covering 22
Bacterial and 4 Archaeal phyla. When available, at least two genomes were included for every
phylum. The first analysis presented here is based on 16S rRNA alignments. A single 16S rRNA
gene was extracted from each of the genomes and an alignment was produced spanning the maximum
length of the gene. A phylogenetic tree was constructed based on this alignment, as shown in
Figure 1. With the exception of the Negativicutes, branches of the tree were collapsed in those
cases where the analyzed species within a phylum clustered together. With the exception of some
Firmicutes, the analyzed genomes cluster according to their phylum, although the Deferribacteres
phylum is mixed with the Proteobacteria, and two members of the Proteobacteria are not
positioned with other members of their phylum (Lawsonia intracellularis and Magnetococcus).
That most phyla could be collapsed is consistent with the weight of 16S rRNA similarities in
currently accepted taxonomic descriptions of prokaryotes. The Firmicutes, however, show less
consistency. Although most of the analyzed Firmicutes cluster together, two species are
separated from the Firmicutes branch (Eubacterium cylindroides and Thermoanaerobacter sp., both
members of Clostridia). The Negativicutes are positioned within the Firmicutes cluster, and
this part of the tree is expanded in the figure for clarity. As can be seen, phylogeny of the
16S rRNA gene provides good resolution between the different genera of the analyzed
Negativicutes. All Veillonella spp. are clustered within one branch of the Negativicutes. The
Acidaminococcaceae (to which Phascolarctobacterium spp. also belong) are placed within the
cluster of the Veillonellaceae, in accordance with their current classification [27]. The
Acidaminococcaceae used to be recognized as a separate family within the Negativicutes, just
like the Veillonellaceae, and during preparation of this contribution these two families were
presented as such in the Taxonomy database at NCBI. Of note is the relatively close
relationship between Negativicutes and two Clostridium species (C. botulinum and
C. cellulolyticum), which do not cluster with other members of the Clostridium genus
(Figure 1). That genus displays a high degree of variation, and reclassification of some of its
members is in progress (see for example [27]). That two members of the Clostridia are even
placed outside the Firmicutes phylum is an indication of 16S rRNA gene sequence heterogeneity
within this class.




Figure 1. Phylogenetic neighbor-joining tree based on 16S rRNA genes extracted from 145
genomes (24 Negativicutes and 121 prokaryotic genomes representing 26 phyla). Bootstrap values
of 50 and higher are indicated. With the exception of the Negativicutes, branches where all
organisms belong to the same phylum are collapsed and named by the phylum they represent. The
green shading indicates the position of Firmicutes. The collapsed branch of the Bacilli, marked
(1), contains Turicibacter sanguinis, a Firmicutes member of the Erysipelotrichales, as well as
Bacilli members. An uncollapsed tree is included in the supplementary material.

Next, all protein-coding genes of the analyzed genomes were compared and a composition vector
tree (CVtree) was produced, based on amino acid sequences (Figure 2). The topology of the
resulting tree is generally in accordance with the 16S rRNA tree shown in the previous figure.
As indicated by the collapsed branches, the CVtree grouped most genomes according to their
known taxonomic phyla, although not all Spirochaetes cluster together. In contrast to the 16S
rRNA tree, in this protein tree all the Firmicutes cluster together, and are distinct from
other phyla. The Negativicutes genomes, nested within the Firmicutes, again have the
Acidaminococcaceae placed within the Veillonellaceae, while all Veillonella spp. are found in
one cluster. All Clostridia, this time divided into two collapsed branches, are positioned as
the closest relatives to Negativicutes. It is of interest that among the closest relatives to
Firmicutes, based on this analysis, are the Fusobacteria and the Elusimicrobia; these are
atypical diderm bacteria that produce lipopolysaccharides [28]. However, the spirochete
Brachyspira murdochii does not possess two membranes, but is nevertheless grouped with atypical
diderms. On the other hand, while the Synergistetes are atypical diderm bacteria, they are
placed elsewhere in the tree (Figure 2).



Figure 2. Phylogenetic tree based on composition vector analysis (CVtree) of all protein-coding
genes (amino acid sequences) derived from the analyzed genomes. Note that the branch lengths in
this plot are artificial. The coloring is the same as in Figure 1 and branches have been
collapsed. The Firmicutes branch Bacilli, marked (1), contains Turicibacter sanguinis. An
uncollapsed tree is included in the supplementary material.

A third analysis was based on a subset of proteins found conserved amongst all analyzed
genomes. These conserved proteins were selected based on a protein BLAST (a cutoff of 50%
identity and 50% coverage of the query length was used) and single linkage clustering. The
analysis identified 29 genes that are shared among all 145 genomes (Table 2). A consensus tree
was constructed based on these 29 conserved proteins (Figure 3). The results confirm the global
observations of the other two phylogenetic analyses: the Negativicutes cluster together and are
most closely related to Clostridia (in this case the most closely related species are
Desulfotomaculum reducens and Acetohalobium arabaticum). As before, the Acidaminococcaceae
cluster together but within the Veillonellaceae. The position of Turicibacter sanguinis within
the Bacilli group of Firmicutes is consistent with the other two trees but contrasts with its
taxonomic description at NCBI as a member of the Erysipelotrichia.

Table 2. Universally conserved COGs

Group | Average length (aa) | Annotation
COG0012 | 380 | Predicted GTPase, probable translation factor
COG0016 | 423 | Phenylalanyl-tRNA synthetase alpha subunit
COG0048 | 137 | Ribosomal protein S12
COG0049 | 182 | Ribosomal protein S7
COG0052 | 240 | Ribosomal protein S2
COG0080 | 154 | Ribosomal protein L11
COG0081 | 230 | Ribosomal protein L1
COG0087 | 288 | Ribosomal protein L3
COG0091 | 157 | Ribosomal protein L22
COG0092 | 240 | Ribosomal protein S3
COG0093 | 130 | Ribosomal protein L14
COG0094 | 182 | Ribosomal protein L5
COG0096 | 131 | Ribosomal protein S8
COG0097 | 177 | Ribosomal protein L6P/L9E
COG0098 | 220 | Ribosomal protein S5
COG0100 | 145 | Ribosomal protein S11
COG0102 | 167 | Ribosomal protein L13
COG0103 | 172 | Ribosomal protein S9
COG0172 | 442 | Seryl-tRNA synthetase
COG0184 | 154 | Ribosomal protein S15P/S13E
COG0186 | 122 | Ribosomal protein S17
COG0197 | 175 | Ribosomal protein L16/L10E
COG0200 | 166 | Ribosomal protein L15
COG0201 | 445 | Preprotein translocase subunit SecY
COG0202 | 323 | DNA-directed RNA polymerase, alpha subunit
COG0256 | 178 | Ribosomal protein L18
COG0495 | 854 | Leucyl-tRNA synthetase
COG0522 | 199 | Ribosomal protein S4 and related proteins
COG0533 | 375 | Metal-dependent proteases with chaperone activity

Figure 3. Consensus tree based on the phylogenetic trees of 27 genes conserved in all 145
genomes. The collapsed branch of the Bacilli, marked (1), contains Turicibacter sanguinis. An
uncollapsed tree is available as a supplemental figure.



In conclusion, based on three independent phylogenetic analyses, the closest relatives of the Negativicutes seem to be the Clostridiaceae. The observed clustering of species within the Negativicutes is consistent with their assigned taxonomy. Furthermore, these analyses show that Veillonella spp. form a distinct branch, the one most different from the other Negativicutes, and they confirm the recent change of status of the Acidaminococcaceae (they are no longer a separate family).



Apart from comparing proteins and genes, genomes can also be compared based on nucleotide composition, irrespective of their coding capacity. For instance, the frequency of nucleotide combinations can reveal similarities between genomes that are independent of protein-coding information. We compared the frequency of tetranucleotides for all 145 genomes. The observed frequency of all 256 tetranucleotide combinations was extracted for each genome, and these frequencies were divided by the theoretically calculated, expected frequencies (corrected for differences in base composition). This ratio, which could be interpreted as a genomic signature, was expected to reflect taxonomic divisions [29]. However, although the analysis identified a high similarity in tetranucleotide frequency for all of the analyzed Veillonella genomes, most of the clustering observed was not in accordance with known taxonomic relationships. Not only were Negativicutes other than Veillonella separated from each other and strewn across the phyla, but several other Firmicutes were also distributed over various branches (data shown as supplementary material). In fact, for most of the analyzed genomes, members of identical phyla did not cluster together and even the Archaea were mixed with Bacteria, although some closely related species were indeed clustered. This may explain why all Veillonella genomes grouped together. Several organisms with similar tetranucleotide frequencies did not share a common ecological niche, in contrast to previously reported observations (reviewed in [30]). Neither was the obtained clustering dictated by GC-content. The conclusion from this analysis was that tetranucleotide analysis is only taxonomically informative for closely related genomes.
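
The observed-to-expected ratio described above can be illustrated with a short Python sketch. It is not the authors' implementation. The expected frequency is assumed here to follow a zero-order model (the product of the single-base frequencies of the sequence), only the given strand is counted, and the function name tetranucleotide_signature is hypothetical.

```python
from collections import Counter
from itertools import product
import math

BASES = "ACGT"

def tetranucleotide_signature(seq, k=4):
    """Return {k-mer: observed/expected} for one DNA sequence.

    Assumption: expected counts come from a zero-order model, i.e. the
    product of the single-base frequencies of this sequence.
    """
    seq = seq.upper()
    # Single-base frequencies, ignoring ambiguous symbols such as N.
    base_counts = Counter(b for b in seq if b in BASES)
    total = sum(base_counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    base_freq = {b: base_counts[b] / total for b in BASES}

    # Observed counts over all overlapping windows made of A, C, G, T only.
    observed = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(b in BASES for b in word):
            observed[word] += 1
    windows = sum(observed.values())

    signature = {}
    for word in map("".join, product(BASES, repeat=k)):  # 4**4 = 256 words
        expected = windows * math.prod(base_freq[b] for b in word)
        # A word whose expected count is zero cannot be scored.
        signature[word] = observed[word] / expected if expected > 0 else float("nan")
    return signature
```

Each genome then yields a vector of 256 ratios, and genomes can be clustered on any distance between these vectors; the distance and clustering method are left open here because this section does not specify them.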



We also compared whole-genome amino acid frequencies in each of the deduced proteomes. Although the results agree slightly better with known taxonomy than the genomic signatures discussed above, this analysis does not cluster organisms according to their phyla, and again some Archaea are mixed with Bacteria. The relevant part of the heatmap based on amino acid frequency is shown in Figure 4. All Veillonella genomes cluster together within a larger cluster that contains all Negativicutes except two of the three Dialister genomes, which are placed closest to Clostridium species (see the supplemental information for a version of this figure showing all the genomes). The major Negativicutes cluster also contains a Geobacillus (a Gram-positive member of the Firmicutes) and a methanogenic archaeon. Interestingly, the closest relatives of this cluster are not Clostridia, as the preceding phylogenetic trees suggest, but a number of Proteobacteria. It is striking that the amino acid frequency analysis detects similarities to Proteobacteria, with which the Negativicutes have their two membranes in common.
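
As a minimal sketch of the amino acid comparison, the following Python code computes the relative frequency of the 20 standard amino acids over a whole proteome and a distance between two such profiles. The Euclidean distance is an assumption made for illustration; the distance and clustering method behind Figure 4 are not stated in this section, and both function names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def amino_acid_frequencies(proteins):
    """Relative frequency of each standard amino acid over all proteins."""
    counts = Counter()
    for protein in proteins:
        # Stop symbols and ambiguous residues (e.g. '*', 'X') are skipped.
        counts.update(aa for aa in protein.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}

def euclidean_distance(freq_a, freq_b):
    """Distance between two amino acid frequency profiles (assumed metric)."""
    return sum((freq_a[aa] - freq_b[aa]) ** 2 for aa in AMINO_ACIDS) ** 0.5
```

Because such a profile has only 20 dimensions and responds to GC-content and growth environment, two unrelated organisms can end up close together, which is consistent with the mixed clusters reported above.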




[Figure 4: heatmap. Columns are amino acids in the order C, W, H, M, P, R, Q, N, Y, F, A, L, K, I, S, T, D, E, V, G; rows are the genomes in the displayed cluster.]


Figure 4. A zoomed heatmap of the amino acid frequencies found in the deduced proteomes of all 145 genomes. A fragment of the heatmap is shown, presenting the cluster in which all but two Negativicutes are found. The remaining two, both Dialister microaerophilus genomes, are positioned elsewhere in the tree, closest to Clostridium cellulolyticum (not shown in this zoom). The color scale runs from highly underrepresented (orange) to highly overrepresented (magenta) amino acid frequency. The full figure is available as supplementary information.

The metabolic properties encoded by the genomes were analyzed next, based on KEGG comparisons [24]. The results are again visualized in a heatmap (Figure 5). We hypothesized that this analysis could identify similarities based on niche adaptation. For simplicity, only a selected number of phyla are shown: apart from the Firmicutes, genomes are included that represent Bacteroidetes and Proteobacteria (both of which contain members frequently found in the oral or gut microbiome), while Cyanobacteria are included as representatives of a phylum that occupies an environmental niche. Since the genomes are compared based on predicted proteomes, their annotation was standardized in order to reduce artificial variation caused by gene annotation differences. As can be seen in Figure 5, the Veillonella genomes all cluster together at the right-hand side of the plot, within a larger cluster containing most of the other Negativicutes and some other Firmicutes. The three Dialister genomes are placed outside the Negativicutes cluster. The other Firmicutes that are found combined with the Negativicutes, based on their metabolic potential, are Clostridium cellulolyticum, Eubacterium rectale, Lactococcus lactis, Streptococcus pneumoniae and Turicibacter sanguinis. These are all common members of the oral or intestinal microbiome. As expected, the metabolic pathway for lipopolysaccharide biosynthesis is shared between the Negativicutes and other Gram-negative species, as indicated by the arrows in Figure 5. Interestingly, the Cyanobacteria form a small cluster within, not outside, the tree, together with a Haliangium and a Sorangium species as their closest neighbors (both are social Myxococcales belonging to the Deltaproteobacteria). The exclusive ability of carbon fixation by Cyanobacteria is apparent from the dark red square in the block 'energy'. The lanes of Veillonella in Figure 5 are dominated by light colors, indicative of medium metabolic potential; that is, in contrast to some genomes where most of the pathways are present (dark red, for Proteobacteria for example) or missing (dark green, for other Negativicutes), the Veillonella genomes have partial pathways (based on knowledge primarily from aerobic genomes). There is no reason to believe that the Veillonella genomes should have less metabolic potential than other Negativicutes. Indeed, it is likely that the differences in metabolic potential of Veillonella truly reflect alternative capabilities of these bacteria.
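
The distinction between present, partial and missing pathways can be made concrete with a small Python sketch. The scoring behind Figure 5 is not described in this section, so the fraction used below is an assumption, the function name pathway_completeness is hypothetical, and the identifiers are placeholders rather than real KEGG entries.

```python
def pathway_completeness(genome_functions, pathways):
    """Fraction of each pathway's required functions found in a genome."""
    present = set(genome_functions)
    scores = {}
    for name, required in pathways.items():
        required = set(required)
        if not required:
            continue  # an empty pathway definition cannot be scored
        scores[name] = len(present & required) / len(required)
    return scores

# Placeholder identifiers, not real KEGG orthology numbers.
example_pathways = {
    "pathway_1": ["fn_a", "fn_b", "fn_c", "fn_d"],
    "pathway_2": ["fn_a"],
    "pathway_3": ["fn_x"],
}
example_scores = pathway_completeness({"fn_a", "fn_b"}, example_pathways)
```

Under this assumed scoring, a value near 1 would correspond to the dark red end of the heatmap, a value near 0 to the dark green end, and intermediate values to the light colors that dominate the Veillonella lanes.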

It was further investigated how conserved the predicted proteomes are within the Negativicutes. As a quantitative measure of homology, shared protein-coding genes were identified by pairwise BLASTP comparison and expressed as a percentage of the combined proteomes. The results are shown in a matrix (Figure 6). In addition to the proteomes of the 24 Negativicutes, the comparison includes Clostridium botulinum, Cl. cellulolyticum and Desulfotomaculum reducens, as these Firmicutes were shown to share characteristics with Negativicutes in previous analyses (cf. Figures 1 and 3). The proteome of E. coli K12 is included as an example of a Gram-negative intestinal bacterium. The BLAST matrix was constructed using reciprocal best BLAST hits to determine the presence of shared protein families between two genomes. Inspection of Figure 6 shows that the genus Veillonella is relatively homogeneous; any two members of this genus share between 67% and 90% homology (1,357 to 1,682 protein families), irrespective of the species. The genus Selenomonas is more heterogeneous, with pairwise homology varying from 42% to 82% between any two species (980 to 1,659 protein families). The three proteomes of Dialister spp., covering two species, share between 40% and 84% homology. The highest homologous fraction identified between two members of different genera within the Negativicutes is 43% (Mitsuokella multacida compared to Selenomonas sputigena), whereas the lowest homology is 15% (Dialister spp. compared to Thermosinus carboxydivorans). Negativicutes share between 9% and 33% homology with the other analyzed Firmicutes, whereas slightly lower homology is detected with E. coli (between 7% and 24%).
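
The two steps of the matrix calculation, pairing proteins by reciprocal best hits and expressing the shared set as a percentage, can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' pipeline: the best-hit tables are taken as given (the BLAST cutoffs are not listed in this section), "combined proteomes" is interpreted as the union of the protein families of both genomes, and both function names are hypothetical.

```python
def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Pairs (a, b) where a's best hit is b and b's best hit is a.

    Each argument maps a query protein to its single best hit in the
    other proteome; proteins without a hit are simply absent.
    """
    return {
        (a, b)
        for a, b in best_a_to_b.items()
        if best_b_to_a.get(b) == a
    }

def percent_shared(n_shared, n_families_a, n_families_b):
    """Shared families as a percentage of the combined (union) set.

    Assumption: a shared family is counted once in the denominator.
    """
    combined = n_families_a + n_families_b - n_shared
    if combined <= 0:
        raise ValueError("combined family count must be positive")
    return 100.0 * n_shared / combined
```

With this denominator, two identical proteomes score 100% and the score falls as either genome carries more families that the other lacks, so a comparison between a small and a large proteome can never reach a high percentage.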

Finally, we assessed the gene pool conserved within all analyzed Negativicutes. Using the same cutoff for protein BLAST comparison as before, a core genome is identified that contains about 300 conserved protein families (data not shown). This is a relatively low number of conserved proteins, reflecting the extensive genetic heterogeneity within this bacterial class.

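
A core genome of this kind is the intersection of the protein family sets of all genomes. The Python sketch below assumes that every genome has already been reduced to a set of family identifiers; how families are defined is not shown here, and the function name core_genome is hypothetical.

```python
def core_genome(family_sets):
    """Protein families present in every genome of the collection."""
    sets = [set(families) for families in family_sets]
    if not sets:
        return set()
    return set.intersection(*sets)
```

Since an intersection can only shrink as genomes are added, a single divergent genome is enough to lower the count, which fits the small core reported for a class as heterogeneous as the Negativicutes.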
Figure 5. Heatmap of metabolic potential, based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) ontology. Green in the heatmap indicates weak metabolic potential, while red signals strong potential. The arrows to the right indicate the scores for lipopolysaccharide biosynthesis. A version summarizing the metabolic pathways and showing the species legend is available as supplementary material.

Figure 6. Proteome comparison represented by a BLAST matrix, based on reciprocal best hits among 24 Negativicutes genomes. The genomes of Clostridium botulinum, Cl. cellulolyticum, Desulfotomaculum reducens and E. coli are added for comparison. Inter-genus comparisons are indicated by black squares. A version reporting the numerical values of homology percentages is available as supplementary information.



Discussion 

The availability of complete sequences for a large and diverse set of bacterial genomes has helped in exploring the conundrum of the genus Veillonella, a genus within the class Negativicutes, all of whose members are Gram-negative Firmicutes. The 16S rRNA tree shown as Figure 1 illustrates how "close" the Negativicutes are to other Firmicutes. The closest Gram-positive Clostridium species are actually quite distant from Veillonella and other Negativicutes genomes, as can be seen from the low fraction of shared protein families in Figure 6. The Gram-negative Firmicutes are even more distant from other Gram-negatives, such as Proteobacteria (e.g., E. coli). It should be noted that the family Clostridiaceae is a highly diverse group with many members being reclassified [27]. It is therefore possible that the taxonomic description of some Clostridium genomes may change in the future. However, our analyses did not identify a single Gram-positive member of the Firmicutes (Clostridia or others) that was consistently identified as most closely related to Veillonella. As seen from three types of phylogenetic analysis, the Negativicutes genomes form a distinct cluster within the Firmicutes, and the genus Veillonella forms a relatively homogeneous group of species within the Negativicutes, with relatively conserved metabolic properties (Figure 5). In comparison, the genus Selenomonas is more heterogeneous, at least based on total gene comparison, as illustrated in Figure 6.

In contrast to expectations, relatively little homology between Negativicutes and other Gram-negative genomes was detected in our analyses. Neither gene-dependent phylogenetic analysis nor gene-independent DNA tetramer analysis identified significant similarity between Negativicutes and, say, Proteobacteria. Only whole-genome frequency analysis of amino acid usage identified some similarity to a few Proteobacteria, and this might reflect the environment to which the organisms are adapted rather than phylogeny. Using KEGG pathways for metabolic comparison of the proteomes, we found few pathways in common, with the exception of a shared lipopolysaccharide biosynthesis pathway. From all analyses combined, it is clear that the taxonomic placement of Negativicutes within the Firmicutes reflects their genetic and genomic characteristics, although the proteins encoded by the Negativicutes genomes are quite distinct from those of their Gram-positive cousins. It could be speculated that the double membrane of the Negativicutes evolved in a lineage that used to be a single-membrane (Gram-positive) Firmicute. Whether this event occurred independently of the formation of other Gram-negative phyla, or was the result of lateral gene transfer, cannot be stated for certain at present; estimation of horizontally transferred regions in Veillonella parvula DSM 2008, the only fully assembled Veillonella genome available, using the least conservative method on the IslandViewer website [31], revealed that only 2% of the genome is of foreign origin. In comparison, 9% of the E. coli K-12 substr. MG1655 genome was predicted to be horizontally transferred. Further analyses are therefore needed to assess this in more detail.



Authors' contributions

Tammi Vesth was a main contributor to the writing of the manuscript and to the organization of the work. Trudy Wassenaar helped considerably in editing and improving the manuscript. Individual contributions: Asli Ozen (16S rRNA and CV tree), Oksana Lukjancenko (consensus tree), Sandra Andersen (initial investigations and background research, early version of the manuscript), Rolf Sommer Kaas (BLAST matrix), Jon Bohlin (tetramer and amino acid usage heatmaps), Intawat Nookaew (metabolism heatmaps). David Ussery provided the original idea for this manuscript, suggested the figures, helped in early drafts of the manuscript, and supervised the project.



Acknowledgements 

This research was supported by grants from the Danish Research Council, and in part by grant 09-067103/DSF from the Danish Council for Strategic Research.



References 

1. Delwiche EA, Pestka JJ, Tortorello ML. The veillonellae: gram-negative cocci with a unique physiology. Annu Rev Microbiol 1985; 39:175-193. http://dx.doi.org/10.1146/annurev.mi.39.100185.001135

2. Diaz PI, Chalmers NI, Rickard AH, Kong C, Milburn CL, Palmer RJ, Kolenbrander PE. Molecular Characterization of Subject-Specific Oral Microflora during Initial Colonization of Enamel. Appl Environ Microbiol 2006; 72:2837-2848. http://dx.doi.org/10.1128/AEM.72.4.2837-2848.2006

3. Aas JA, Paster BJ, Stokes LN, Olsen I, Dewhirst FE. Defining the Normal Bacterial Flora of the Oral Cavity. J Clin Microbiol 2005; 43:5721-5732. http://dx.doi.org/10.1128/JCM.43.11.5721-5732.2005

4. Mager DL, Ximenez-Fyvie LA, Haffajee AD, Socransky SS. Distribution of selected bacterial species on intraoral surfaces. J Clin Periodontol 2003; 30:644-654. http://dx.doi.org/10.1034/j.1600-051X.2003.00376.x

5. Olson JC, Cuff CF, Lukomski S, Lukomska E, Canizales Y, Wu B, Crout RJ, Thomas JG, McNeil DW, Weyant RJ, et al. Use of 16S ribosomal RNA gene analyses to characterize the bacterial signature associated with poor oral health in West Virginia. BMC Oral Health 2011; 11:7. http://dx.doi.org/10.1186/1472-6831-11-7






6. Chalmers NI, Palmer RJ, Cisar JO, Kolenbrander PE. Characterization of a Streptococcus sp.-Veillonella sp. Community Micromanipulated from Dental Plaque. J Bacteriol 2008; 190:8145-8154. http://dx.doi.org/10.1128/JB.00983-08

7. Leuckfeld I, Paster BJ, Kristoffersen AK, Olsen I. Diversity of Veillonella spp. from subgingival plaque by polyphasic approach. APMIS 2010; 118:230-242. http://dx.doi.org/10.1111/j.1600-0463.2009.02584.x

8. Tunney MM, Field TR, Moriarty TF, Patrick S, Doering G, Muhlebach MS, Wolfgang MC, Boucher R, Gilpin DF, McDowell A, Elborn JS. Detection of anaerobic bacteria in high numbers in sputum from patients with cystic fibrosis. Am J Respir Crit Care Med 2008; 177:995-1001. http://dx.doi.org/10.1164/rccm.200708-1151OC

9. Murri M, Leiva I, Gomez-Zumaquero JM, Tinahones FJ, Cardona F, Soriguer F, Queipo-Ortuno MI. Gut microbiota in children with type 1 diabetes differs from that in healthy children: a case-control study. BMC Med 2013; 11:46. http://dx.doi.org/10.1186/1741-7015-11-46

10. Kolenbrander PE, Moore LVH. The genus Veillonella. In: Balows A, Truper HG, Dworkin M, Harder W, Schleifer KH (eds), The Prokaryotes, 2nd ed. Springer, New York, 1992, p. 2034-2047.

11. Mashima I, Nakazawa F. Identification of Veillonella tobetsuensis in tongue biofilm by using a species-specific primer pair. Anaerobe 2013; 22:77-81.

12. De Vos P, Garrity GM, Jones D, Krieg NR, Ludwig W, Rainey FA, Schleifer KH, Whitman WB. Volume 3: The Firmicutes. In: Bergey's Manual of Systematic Bacteriology, Springer 2009.

13. Lagesen K, Hallin P, Rodland EA, Staerfeldt HH, Rognes T, Ussery DW. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res 2007; 35:3100-3108. http://dx.doi.org/10.1093/nar/gkm160

14. Katoh K, Toh H. Parallelization of the MAFFT multiple sequence alignment program. Bioinformatics 2010; 26:1899-1900. http://dx.doi.org/10.1093/bioinformatics/btq224

15. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. MEGA5: Molecular Evolutionary Genetics Analysis using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Mol Biol Evol 2011; 28:2731-2739. http://dx.doi.org/10.1093/molbev/msr121

16. Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 1987; 4:406-425.

17. Xu Z, Hao B. CVTree update: a newly designed phylogenetic study platform using composition vectors and whole genomes. Nucleic Acids Res 2009; 37(suppl 2):W174-W178. http://dx.doi.org/10.1093/nar/gkp278

18. Ciccarelli FD, Doerks T, von Mering C, Creevey CJ, Snel B, Bork P. Toward automatic reconstruction of a highly resolved tree of life. Science 2006; 311:1283-1287. http://dx.doi.org/10.1126/science.1123061

19. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 2004; 32:1792-1797. http://dx.doi.org/10.1093/nar/gkh340

20. Fink WL. Microcomputers and phylogenetic analysis. Science 1986; 234:1135-1139. http://dx.doi.org/10.1126/science.234.4780.1135

21. Retief JD. Phylogenetic analysis using PHYLIP. Methods Mol Biol 2000; 132:243-258.

22. Karlin S, Burge C. Dinucleotide relative abundance extremes: a genomic signature. Trends Genet 1995; 11:283-290. http://dx.doi.org/10.1016/S0168-9525(00)89076-2

23. Wickham H. ggplot2: Elegant Graphics for Data Analysis (Use R!). Springer New York, 2009. ISBN-10: 0387981403 | ISBN-13: 978-0387981406

24. Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. The KEGG resource for deciphering the genome. Nucleic Acids Res 2004; 32:277-280. http://dx.doi.org/10.1093/nar/gkh063

25. Eddy SR. Profile hidden Markov models. Bioinformatics 1998; 14:755-763. http://dx.doi.org/10.1093/bioinformatics/14.9.755

26. Hobohm U, Sander C. Enlarged representative set of protein structures. Protein Sci 1994; 3:522-524. http://dx.doi.org/10.1002/pro.5560030317

27. Ludwig W, Schleifer KH, Whitman W. Revised road map to the phylum Firmicutes. In: Bergey's Manual of Systematic Bacteriology, Springer 2009:1-13.






28. Gupta RS. Origin of diderm (Gram-negative) bacteria: antibiotic selection pressure rather than endosymbiosis likely led to the evolution of bacterial cells with two membranes. Antonie van Leeuwenhoek 2011; 100:171-182. http://dx.doi.org/10.1007/s10482-011-9616-8

29. Pride DT, Meinersmann RJ, Wassenaar TM, Blaser MJ. Evolutionary implications of microbial genome tetranucleotide frequency biases. Genome Res 2003; 13:145-158. http://dx.doi.org/10.1101/gr.335003

30. Dutta C, Paul S. Microbial lifestyle and genome signatures. Curr Genomics 2012; 13:153-162. http://dx.doi.org/10.2174/138920212799860698

31. Langille MG, Brinkman FS. IslandViewer: an integrated interface for computational identification and visualization of genomic islands. Bioinformatics 2009; 25:664-665. http://dx.doi.org/10.1093/bioinformatics/btp030






Standards in Genomic Sciences 



